Abstract

We consider estimation procedures which are recursive in the 
sense that each successive estimator is obtained from the previous 
one by a simple adjustment. The model considered in the paper is 
very general as we do not impose any preliminary restrictions on the 
probabilistic nature of the observation process and cover a wide class 
of nonlinear recursive procedures. In this paper we study the asymptotic
behaviour of the recursive estimators. The results of the paper can
be used to determine the form of a recursive procedure which is
expected to have the same asymptotic properties as the corresponding
non-recursive one defined as a solution of the corresponding
estimating equation.

Keywords: recursive estimation, estimating equations, stochastic approximation.

1 Introduction 

Let $X_1, \dots, X_n$ be independent identically distributed (i.i.d.) random variables (r.v.'s) with a common distribution function $F_\theta$ with a real unknown parameter $\theta$. An M-estimator of $\theta$ is defined as a statistic $\hat\theta_n = \hat\theta_n(X_1, \dots, X_n)$, which is a solution w.r.t. $v$ of the estimating equation

$$\sum_{i=1}^{n} \psi(X_i; v) = 0, \tag{1.1}$$

where $\psi$ is a suitably chosen function. For example, if $\theta$ is a location parameter in the normal family of distribution functions, the choice $\psi(x, v) = x - v$ gives the MLE (maximum likelihood estimator). For the same problem, if $\psi(x, v) = \mathrm{sign}(x - v)$, the solution of (1.1) reduces to the median of $X_1, \dots, X_n$. In general, if $f(x, \theta)$ is the probability density function (or probability function) of $F_\theta(x)$ (w.r.t. a $\sigma$-finite measure $\mu$) then the choice $\psi(x, v) = f'(x, v)/f(x, v)$ yields the MLE.
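
As a small numerical illustration of the estimating equation (1.1), the following sketch evaluates its left hand side at the sample mean for $\psi(x, v) = x - v$ and at the sample median for $\psi(x, v) = \mathrm{sign}(x - v)$. The sample values and the function names are assumptions chosen only for this example.

```python
def estimating_sum(psi, xs, v):
    """Left hand side of the estimating equation: sum of psi(x_i, v)."""
    return sum(psi(x, v) for x in xs)


def psi_mean(x, v):
    # psi(x, v) = x - v, whose root is the sample mean
    return x - v


def psi_median(x, v):
    # psi(x, v) = sign(x - v), whose root is the sample median
    return (x > v) - (x < v)


xs = [2.0, 7.0, 3.0, 10.0, 4.0]      # illustrative sample (odd size)
mean = sum(xs) / len(xs)
median = sorted(xs)[len(xs) // 2]

mean_lhs = estimating_sum(psi_mean, xs, mean)
median_lhs = estimating_sum(psi_median, xs, median)
print(mean_lhs, median_lhs)
```

Both sums vanish (the first up to floating point rounding), so the mean and the median are roots of (1.1) for the respective choices of $\psi$.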

Suppose now that $X_1, \dots, X_n$ are not necessarily independent or identically distributed r.v.'s, with a joint distribution depending on a real parameter $\theta$. Then an M-estimator of $\theta$ is defined as a solution of the estimating equation

$$\sum_{i=1}^{n} \psi_i(v) = 0, \tag{1.2}$$

where $\psi_i(v) = \psi_i(X_{i-k}^{i}; v)$ with $X_{i-k}^{i} = (X_{i-k}, \dots, X_i)$. So, the $\psi$-functions may now depend on the past observations as well. For instance, if the $X_i$'s are observations from a discrete time Markov process, then one can assume that $k = 1$. In general, if no restrictions are placed on the dependence structure of the process $X_i$, one may need to consider $\psi$-functions depending on the vector of all past and present observations of the process (that is, $k = i - 1$). If the conditional probability density function (or probability function) of the observation $X_i$, given $X_{i-k}, \dots, X_{i-1}$, is $f_i(x, \theta) = f_i(x, \theta \mid X_{i-k}, \dots, X_{i-1})$, then one can obtain the MLE on choosing $\psi_i(v) = f_i'(X_i, v)/f_i(X_i, v)$. Besides MLEs, the class of M-estimators includes estimators with special properties such as robustness. Under certain regularity and ergodicity conditions, it can be proved that there exists a consistent sequence of solutions of (1.2) which has the property of local asymptotic linearity. (A comprehensive bibliography can be found in, e.g., Hampel et al. (1986) and Rieder (1994).)

If the $\psi$-functions are nonlinear, it is rather difficult to work with the corresponding estimating equations, especially if for every sample size $n$ (when new data are acquired), an estimator has to be computed afresh. In this paper we consider estimation procedures which are recursive in the sense that each successive estimator is obtained from the previous one by a simple adjustment. Note that for a linear estimator, e.g., for the sample mean, $\hat\theta_n = \bar X_n$, we have $\bar X_n = (n-1)\bar X_{n-1}/n + X_n/n$, that is $\hat\theta_n = \hat\theta_{n-1}(n-1)/n + X_n/n$, indicating that the estimator $\hat\theta_n$ at each step $n$ can be obtained recursively using the estimator at the previous step $\hat\theta_{n-1}$ and the new information $X_n$. Such an exact recursive relation may not hold for nonlinear estimators (see, e.g., the case of the median).
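
The exact recursion for the sample mean can be sketched in a few lines. The function name and the data below are illustrative assumptions; the update itself is the relation $\hat\theta_n = \hat\theta_{n-1}(n-1)/n + X_n/n$ stated above.

```python
def recursive_mean(xs):
    """Return the running sample means, updating one observation at a time."""
    est = 0.0          # the starting value is irrelevant: it gets weight 0 at n = 1
    path = []
    for n, x in enumerate(xs, start=1):
        est = est * (n - 1) / n + x / n
        path.append(est)
    return path


xs = [2.0, 7.0, 3.0, 10.0, 4.0]   # illustrative data
path = recursive_mean(xs)
print(path[-1], sum(xs) / len(xs))
```

Only the current estimate and the step counter are stored, which is the practical attraction of a recursive procedure when data arrive sequentially.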

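In general, the following heuristic argument can be used to establish a possible form of an approximate recursive relation (see also Jureckova and Sen (1996), Khas'minskii and Nevelson (1972), Lazrieva and Toronjadze (1987)). Since $\hat\theta_n$ is defined as a root of the estimating equation (1.2), denoting the left hand side of (1.2) by $M_n(v)$ we have $M_n(\hat\theta_n) = 0$ and $M_{n-1}(\hat\theta_{n-1}) = 0$. Assuming that the difference $\hat\theta_n - \hat\theta_{n-1}$ is "small" we can write

$$0 = M_n(\hat\theta_n) - M_{n-1}(\hat\theta_{n-1}) = M_n\big(\hat\theta_{n-1} + (\hat\theta_n - \hat\theta_{n-1})\big) - M_{n-1}(\hat\theta_{n-1})$$
$$\approx M_n(\hat\theta_{n-1}) + M_n'(\hat\theta_{n-1})(\hat\theta_n - \hat\theta_{n-1}) - M_{n-1}(\hat\theta_{n-1}) = M_n'(\hat\theta_{n-1})(\hat\theta_n - \hat\theta_{n-1}) + \psi_n(\hat\theta_{n-1}).$$

Therefore,

$$\hat\theta_n \approx \hat\theta_{n-1} - \frac{\psi_n(\hat\theta_{n-1})}{M_n'(\hat\theta_{n-1})},$$

where $M_n'(\theta) = \sum_{i=1}^{n} \psi_i'(\theta)$. Now, depending on the nature of the underlying model, $M_n'(\theta)$ can be replaced by a simpler expression. For instance, in i.i.d. models with $\psi(x, v) = f'(x, v)/f(x, v)$ (the MLE case), by the strong law of large numbers,

$$\frac{1}{n} M_n'(\theta) = \frac{1}{n} \sum_{i=1}^{n} \big(f'(X_i, \theta)/f(X_i, \theta)\big)' \approx E_\theta\big[\big(f'(X_1, \theta)/f(X_1, \theta)\big)'\big] = -i(\theta)$$

for large $n$'s, where $i(\theta)$ is the one-step Fisher information. So, in this case, one can use the recursion¹

$$\hat\theta_n = \hat\theta_{n-1} + \frac{1}{n\, i(\hat\theta_{n-1})} \frac{f'(X_n, \hat\theta_{n-1})}{f(X_n, \hat\theta_{n-1})}, \qquad n \ge 1, \tag{1.3}$$

to construct an estimator which is "asymptotically equivalent" to the MLE.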
Motivated by the above argument, we consider a class of estimators

$$\hat\theta_n = \hat\theta_{n-1} + \Gamma_n^{-1}(\hat\theta_{n-1})\, \psi_n(\hat\theta_{n-1}), \qquad n \ge 1, \tag{1.4}$$

where $\psi_n$ is a suitably chosen vector process, $\Gamma_n$ is a (possibly random) normalizing matrix process and $\hat\theta_0 \in \mathbb{R}^m$ is some initial value. If the conditional probability density function (or the probability function) of the observation $X_n$, given $X_1, \dots, X_{n-1}$, is $f_n(\theta, x \mid x_1^{n-1}) = f_n(\theta, x \mid x_1, \dots, x_{n-1})$, then one can obtain a ML (maximum likelihood) type recursive estimator on choosing $\psi_n(\theta) = \dot f_n^T(\theta, X_n \mid X_1^{n-1}) / f_n(\theta, X_n \mid X_1^{n-1})$ (the dot denotes the row-vector of partial derivatives w.r.t. $\theta \in \mathbb{R}^m$ and $T$ is the transposition).
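
To make the i.i.d. likelihood recursion (1.3) concrete, here is a minimal sketch for a model chosen purely for illustration: the Cauchy location family with unit scale, for which $f'(x, \theta)/f(x, \theta) = 2(x - \theta)/(1 + (x - \theta)^2)$ and the one-step Fisher information is $i(\theta) = 1/2$. The model, the seed, the sample size and the function names are assumptions of this example, not part of the paper.

```python
import math
import random


def cauchy_score(x, theta):
    """f'(x, theta) / f(x, theta) for the Cauchy location density with unit scale."""
    u = x - theta
    return 2.0 * u / (1.0 + u * u)


FISHER_INFO = 0.5   # one-step Fisher information i(theta) of this family


def recursive_mle(xs, theta0):
    """Recursion (1.3): theta_n = theta_{n-1} + score(X_n, theta_{n-1}) / (n * i)."""
    theta = theta0
    for n, x in enumerate(xs, start=1):
        theta += cauchy_score(x, theta) / (n * FISHER_INFO)
    return theta


rng = random.Random(0)
true_theta = 1.0
# Cauchy variates via the inverse distribution function tan(pi * (U - 1/2))
sample = [true_theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20000)]
estimate = recursive_mle(sample, theta0=0.0)
print(estimate)
```

Each observation is used once and then discarded, which is different from iterating a Newton-type scheme on the full likelihood equation at every sample size.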

Note that while the main goal is to study recursive procedures with nonlinear $\psi_n$ functions, it is worth mentioning that any linear estimator can be written in the form (1.4) with linear, w.r.t. $\theta$, $\psi_n$ functions. Indeed, if $\hat\theta_n = \Gamma_n^{-1} \sum_{k=1}^{n} h_k(X_k)$, where $\Gamma_n$ and $h_k(X_k)$ are matrix and vector processes of suitable dimensions, then (see Section 4.2 for details)

$$\hat\theta_n = \hat\theta_{n-1} + \Gamma_n^{-1}\big(h_n(X_n) - (\Gamma_n - \Gamma_{n-1})\hat\theta_{n-1}\big),$$

which is obviously of the form (1.4) with $\psi_n(\theta) = h_n(X_n) - (\Gamma_n - \Gamma_{n-1})\theta$.

Note also that in the i.i.d. case, (1.3) can be regarded as a stochastic iterative scheme, i.e., a classical stochastic approximation procedure, to detect the root of an unknown function when the latter can only be observed with random errors (see Remark 3.1 in Sharia (2006a)). A theoretical implication of this is that by studying the procedures (1.3), or in general (1.4), we study asymptotic behaviour of the estimator of the unknown parameter. As far as applications are concerned, there are several advantages in using (1.4). Firstly, these procedures are easy to use since each successive estimator is obtained from the previous one by a simple adjustment and without storing all the data unnecessarily. This is especially convenient when the data come sequentially. Another potential benefit of using (1.4) is that it allows one to monitor and detect certain changes in probabilistic characteristics of the underlying process such as change of the value of the unknown parameter. So, there may be a benefit in using these procedures in linear cases as well.

¹ This procedure should not be confused with the Newton-Raphson iterative method. See the corresponding discussion in the Introduction of Sharia (2006a).

In i.i.d. models, estimating procedures similar to (1.4) have been studied by a number of authors using methods of stochastic approximation theory (see, e.g., Khas'minskii and Nevelson (1972), Fabian (1978), Ljung and Soderstrom (1987), Ljung et al. (1992), and references therein). Some work has been done for non i.i.d. models as well. In particular, Englund et al. (1989) give asymptotic representation results for certain types of $X_n$ processes. In Sharia (1998), theoretical results on convergence, rate of convergence and the asymptotic representation are given under certain regularity and ergodicity assumptions on the model, in the one-dimensional case with $\psi_n(x, \theta) = \frac{\partial}{\partial\theta} \log f_n(x, \theta)$ (see also Campbell (1982), Sharia (1992), and Lazrieva et al. (1997)).

We study multidimensional estimation procedures of type (1.4) for the general statistical model. In Sharia (2006a), imposing "global" restrictions on the processes $\psi$ and $\Gamma$, we study "global" convergence of the recursive estimators, that is the convergence for an arbitrary starting value $\hat\theta_0$. In Sharia (2006b), we present results on the rate of the convergence. In this paper we are concerned with asymptotic behaviour of the estimators defined by (1.4). Since the model considered is very general, the main objective is to prove that $\hat\theta_n$ is locally asymptotically linear, that is, for each $\theta$ there exists a matrix process $G_n(\theta)$ such that

$$\hat\theta_n - \theta = G_n^{-1}(\theta) \sum_{i=1}^{n} \psi_i(\theta) + \varepsilon_n^\theta,$$

where $G_n^{1/2}(\theta)\,\varepsilon_n^\theta \to 0$ in probability (see Section 2 for a more general definition).

Since $\psi_t(\theta)$ is typically a martingale-difference, asymptotic distribution of an asymptotically linear estimator can be studied using a suitable form of the central limit theorem for martingales (see e.g., Feigin (1985), Hutton and Nelson (1986), Jacod and Shiryayev (1987); a detailed discussion of the literature on this subject can be found in Barndorff-Nielsen and Sorensen (1994), Heyde (1997) and Prakasa-Rao (1999)). For example, results in Shiryayev (1984) (see, e.g., Ch.VII, §8, Theorem 4) show that under certain conditions, local asymptotic linearity implies asymptotic normality. In the standard case of i.i.d. observations, assuming that $\psi_n(\theta) = \psi(\theta, X_n)$ has zero mean and a finite second moment and $G_n(\theta) = n\gamma(\theta)$, for some non-random invertible $\gamma(\theta)$, it follows that

$$\mathcal{L}\big(n^{1/2}(\hat\theta_n - \theta) \mid P^\theta\big) \to \mathcal{N}\big(0, \gamma^{-1}(\theta)\, j_\psi(\theta)\, \gamma^{-1}(\theta)\big),$$

where

$$j_\psi(\theta) = \int \psi(\theta, x)\psi^T(\theta, x) f(\theta, x)\,\mu(dx) < \infty.$$

In particular, in the case of likelihood recursion with

$$\psi(\theta, x) = \dot f^T(\theta, x)/f(\theta, x),$$

if $\gamma(\theta)$ is the one-step Fisher information, that is,

$$\gamma(\theta) = i(\theta) = \int \psi(\theta, x)\psi^T(\theta, x) f(\theta, x)\,\mu(dx),$$

it follows that $\hat\theta_n$ is asymptotically normal with parameters $(0, i^{-1}(\theta))$, i.e.

$$\mathcal{L}\big(n^{1/2}(\hat\theta_n - \theta) \mid P^\theta\big) \to \mathcal{N}\big(0, i^{-1}(\theta)\big),$$

meaning that $\hat\theta_n$ is asymptotically efficient. In general, in the case of a one dimensional parameter $\theta$, an estimator is said to be asymptotically efficient if it is asymptotically linear with

$$\psi_n(\theta) = f_n'(\theta, X_n \mid X_1^{n-1})/f_n(\theta, X_n \mid X_1^{n-1}) \quad \text{and} \quad G_n(\theta) = I_n(\theta),$$

where $I_n(\theta)$ is the conditional Fisher information. This kind of efficiency is called asymptotic first order efficiency. The motivation behind this general definition is the same as in the classical scheme of i.i.d. observations. For a detailed discussion of this notion see, e.g., Hall and Heyde (1980), Section 6.2. Under relatively mild conditions, asymptotically efficient estimators are asymptotically equivalent to the MLE $T_n$, i.e.

$$I_n^{1/2}(\theta)(\hat\theta_n - T_n) \to 0$$

in probability (see, e.g., Hall and Heyde (1980), Section 6.2, Theorem 6.2). For the generalisation of these concepts see Heyde (1997).

It is worth mentioning that the global convergence results for (1.4) were obtained in Sharia (2006a) under conditions that allow $\Gamma_n$ to belong to quite a wide class of processes which does not directly depend on the choice of the $\psi_n$'s (see Remark 3.1 below). In order to study the rate of convergence, one has to restrict the class of allowed $\Gamma_n$'s (see Sharia (2006b)). It turns out that when dealing with local asymptotic linearity, one has to restrict this class even further, to an explicit choice of $\Gamma_n$ depending on the choice of $\psi_n$ (see Remark 3.2(iv)-(vii) below). In other words, the results of the paper tell one how to construct a locally asymptotically linear procedure (1.4) with given $\psi_n$'s. The fact that one is restricted to this choice of $\Gamma_n$ is probably not very surprising in retrospect, but this issue does not seem to have been discussed in the existing literature.

An estimator defined by (1.4) is a recursive analogue of the corresponding M-estimator defined as a solution of the estimating equation (1.2). It should also be noted that the recursive procedure (1.4) is not a numerical solution of (1.2). Nevertheless, under quite mild conditions, the recursive estimator and the corresponding M-estimator are expected to have the same (or equivalent) asymptotic linearity expansions. It therefore follows that they are asymptotically equivalent, in the sense that, depending on the regularity and ergodicity properties of the underlying model, they both have the same asymptotic distribution.

The paper is organized as follows. Section 2 introduces the main objects 
and definitions. The main results are obtained in Section 3 with various 
comments and explanations of the conditions used there. In Section 4 we 
give examples to illustrate the results of the paper. 

2 Basic model 

Let $X_t$, $t = 1, 2, \dots$, be observations taking values in a measurable space $(\mathbf{X}, \mathcal{B}(\mathbf{X}))$ equipped with a $\sigma$-finite measure $\mu$. Suppose that the distribution of the process $X_t$ depends on an unknown parameter $\theta \in \Theta$, where $\Theta$ is an open subset of the $m$-dimensional Euclidean space $\mathbb{R}^m$. Suppose also that for each $t = 1, 2, \dots$, there exists a regular conditional probability density of $X_t$ given values of past observations of $X_{t-1}, \dots, X_2, X_1$, which will be denoted by

$$f_t(\theta, x_t \mid x_1^{t-1}) = f_t(\theta, x_t \mid x_{t-1}, \dots, x_1),$$

where $f_1(\theta, x_1 \mid x_1^0) = f_1(\theta, x_1)$ is the probability density of the random variable $X_1$. Without loss of generality we assume that all random variables are defined on a probability space $(\Omega, \mathcal{F})$ and denote by $\{P^\theta, \theta \in \Theta\}$ the family of the corresponding distributions on $(\Omega, \mathcal{F})$.

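Let $\mathcal{F}_t = \sigma(X_1, \dots, X_t)$ be the $\sigma$-field generated by the random variables $X_1, \dots, X_t$. By $(\mathbb{R}^m, \mathcal{B}(\mathbb{R}^m))$ we denote the $m$-dimensional Euclidean space with the Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R}^m)$. Transposition of matrices and vectors is denoted by $T$. By $(u, v)$ we denote the standard scalar product of $u, v \in \mathbb{R}^m$, that is, $(u, v) = u^T v$, and the corresponding norm is denoted by $\|\cdot\|$.

Suppose that $h$ is a real valued function defined on $\Theta \subset \mathbb{R}^m$. We denote by $\dot h(\theta)$ the row-vector of partial derivatives of $h(\theta)$ with respect to the components of $\theta$, that is,

$$\dot h(\theta) = \left(\frac{\partial}{\partial\theta^{(1)}} h(\theta), \dots, \frac{\partial}{\partial\theta^{(m)}} h(\theta)\right).$$

The $m \times m$ identity matrix is denoted by $\mathbf{1}$.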

If for each $t = 1, 2, \dots$, the derivative $\dot f_t(\theta, x_t \mid x_1^{t-1})$ w.r.t. $\theta$ exists, then we can define

$$l_t(\theta, x_t \mid x_1^{t-1}) = \frac{1}{f_t(\theta, x_t \mid x_1^{t-1})}\, \dot f_t^T(\theta, x_t \mid x_1^{t-1})$$

and the process

$$l_t(\theta) = l_t(\theta, X_t \mid X_1^{t-1})$$

(with the convention $0/0 = 0$). Let us denote

$$i_t(\theta \mid x_1^{t-1}) = \int l_t(\theta, z \mid x_1^{t-1})\, l_t^T(\theta, z \mid x_1^{t-1})\, f_t(\theta, z \mid x_1^{t-1})\,\mu(dz).$$

The one step conditional Fisher information matrix for $t = 1, 2, \dots$ is defined as

$$i_t(\theta) = i_t(\theta \mid X_1^{t-1}).$$

Note that the process $i_t(\theta)$ is "predictable", that is, the random variable $i_t(\theta)$ is $\mathcal{F}_{t-1}$ measurable for each $t \ge 1$. Note also that by definition, $i_t(\theta)$ is a version of the conditional expectation w.r.t. $\mathcal{F}_{t-1}$, that is,

$$i_t(\theta) = E_\theta\{l_t(\theta)\, l_t^T(\theta) \mid \mathcal{F}_{t-1}\}.$$

Everywhere in the present work conditional expectations are meant to be calculated as integrals w.r.t. the conditional probability densities.

The conditional Fisher information at time $t$ is

$$I_t(\theta) = \sum_{s=1}^{t} i_s(\theta), \qquad t = 1, 2, \dots.$$

We say that $\psi = \{\psi_t(\theta, x_t, x_{t-1}, \dots, x_1)\}_{t \ge 1}$ is a sequence of estimating functions and write $\psi \in \Psi$, if for each $t \ge 1$, $\psi_t(\theta, x_t, x_{t-1}, \dots, x_1) : \Theta \times \mathbf{X}^t \to \mathbb{R}^m$ is a Borel function.

Let $\psi \in \Psi$ and denote $\psi_t(\theta) = \psi_t(\theta, X_t, X_{t-1}, \dots, X_1)$. We write $\psi \in \Psi^M$ if $\psi_t(\theta)$ is a martingale-difference process for each $\theta \in \Theta$, i.e., if $E_\theta\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$ for each $t = 1, 2, \dots$ (we assume that the conditional expectations above are well-defined and $\mathcal{F}_0$ is the trivial $\sigma$-algebra).

Note that if differentiation of the equation $1 = \int f_t(\theta, z \mid x_1^{t-1})\,\mu(dz)$ is allowed under the integral sign, then $\{l_t(\theta)\}_{t \ge 1} \in \Psi^M$.

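Suppose that $\psi \in \Psi^M$ and $\Gamma_t(\theta)$ is a predictable $m \times m$ matrix process (i.e. a matrix with predictable components $\Gamma_t^{(ij)}(\theta)$) with $\det \Gamma_t(\theta) \ne 0$. We say that an estimator $\hat\theta_t$ is locally asymptotically linear if for each $\theta \in \Theta$,

$$\hat\theta_t = \theta + \Gamma_t^{-1}(\theta) \sum_{s=1}^{t} \psi_s(\theta) + \varepsilon_t^\theta, \tag{2.1}$$

and $A_t(\theta)\,\varepsilon_t^\theta \to 0$ in probability $P^\theta$, where $A_t(\theta)$ is a sequence of $m \times m$ matrices such that $A_t(\theta) \to \infty$ in probability $P^\theta$, and $A_t(\theta)\Gamma_t^{-1}(\theta)A_t(\theta) \to \eta(\theta)$ weakly w.r.t. $P^\theta$ for some random matrix $\eta(\theta)$. That is, $\hat\theta_t$ is locally asymptotically linear if

$$A_t(\theta)(\theta_t^* - \hat\theta_t) \to 0 \tag{2.2}$$

in probability $P^\theta$, where

$$\theta_t^* = \theta + \Gamma_t^{-1}(\theta) \sum_{s=1}^{t} \psi_s(\theta) \tag{2.3}$$

is a linear statistic.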

Convention. Everywhere in the present work $\theta \in \mathbb{R}^m$ is an arbitrary but fixed value of the parameter. Convergence and all relations between random variables are meant with probability one w.r.t. the measure $P^\theta$ unless specified otherwise. A sequence of random variables $(\xi_t)_{t \ge 1}$ has some property eventually if for every $\omega$ in a set of $P^\theta$ probability 1, $\xi_t$ has this property for all $t$ greater than some $t_0(\omega) < \infty$.

3 Main results 

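Suppose that $\psi \in \Psi$ and $\Gamma_t(\theta)$, for each $\theta \in \mathbb{R}^m$, is a predictable $m \times m$ matrix process with $\det \Gamma_t(\theta) \ne 0$, $t \ge 1$. Consider the estimator $\hat\theta_t$ defined by

$$\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(\hat\theta_{t-1})\, \psi_t(\hat\theta_{t-1}), \qquad t \ge 1, \tag{3.1}$$

where $\hat\theta_0 \in \mathbb{R}^m$ is an arbitrary initial point.

Let $\theta \in \mathbb{R}^m$ be an arbitrary but fixed value of the parameter and for any $u \in \mathbb{R}^m$ define

$$R_t(\theta, u) = \Gamma_t(\theta)\Gamma_t^{-1}(\theta + u)\, E_\theta\{\psi_t(\theta + u) \mid \mathcal{F}_{t-1}\}.$$

Denote $\Delta_t = \hat\theta_t - \theta$. Then (3.1) can be rewritten as

$$\Delta_t = \Delta_{t-1} + \Gamma_t^{-1}(\theta) R_t(\theta, \Delta_{t-1}) + \Gamma_t^{-1}(\theta)\,\varepsilon_t, \tag{3.2}$$

where

$$\varepsilon_t = \Gamma_t(\theta)\Gamma_t^{-1}(\theta + \Delta_{t-1})\, \psi_t(\theta + \Delta_{t-1}) - R_t(\theta, \Delta_{t-1})$$

is a $P^\theta$-martingale difference.

Let $\Delta_0^* = 0$ and for $t \ge 1$ denote $\Delta_t^* = \theta_t^* - \theta$ where $\theta_t^*$ is defined by (2.3). Then,

$$\Delta_t^* - \Delta_{t-1}^* = \Gamma_t^{-1}(\theta) \sum_{s=1}^{t} \psi_s(\theta) - \Gamma_{t-1}^{-1}(\theta) \sum_{s=1}^{t-1} \psi_s(\theta)$$
$$= \big(\Gamma_t^{-1}(\theta) - \Gamma_{t-1}^{-1}(\theta)\big) \sum_{s=1}^{t-1} \psi_s(\theta) + \Gamma_t^{-1}(\theta)\psi_t(\theta) \tag{3.3}$$
$$= \Gamma_t^{-1}(\theta)\big(\Gamma_{t-1}(\theta) - \Gamma_t(\theta)\big)\Delta_{t-1}^* + \Gamma_t^{-1}(\theta)\psi_t(\theta).$$

It therefore follows that $\Delta_t^*$ satisfies the recursive relation given by

$$\Delta_t^* = \Delta_{t-1}^* - \Gamma_t^{-1}(\theta)\,\Delta\Gamma_t(\theta)\,\Delta_{t-1}^* + \Gamma_t^{-1}(\theta)\,\varepsilon_t^*, \qquad t \ge 1, \tag{3.4}$$

where $\Delta\Gamma_t(\theta) = \Gamma_t(\theta) - \Gamma_{t-1}(\theta)$ and $\varepsilon_t^* = \psi_t(\theta)$. By comparing equations (3.2) and (3.4), one can obtain the following result on the asymptotic relationship between $\hat\theta_t$ and $\theta_t^*$.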
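
The algebra behind (3.3) and (3.4) can be checked numerically in the scalar case. In the sketch below the sequences standing in for $\psi_t(\theta)$ and $\Gamma_t(\theta)$ are arbitrary simulated numbers, and the starting value $\Gamma_0 = 1$ is an assumption made only so that the first increment $\Delta\Gamma_1$ is defined.

```python
import random

rng = random.Random(1)
T = 50

# psi[t] stands in for psi_t(theta); psi[0] is unused
psi = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]

# gamma[t] stands in for Gamma_t(theta); gamma[0] = 1.0 is an assumed starting value
gamma = [1.0]
for t in range(1, T + 1):
    gamma.append(gamma[-1] + rng.uniform(0.5, 1.5))

# Direct definition from (2.3): Delta*_t = Gamma_t^{-1} * sum_{s <= t} psi_s
direct = [0.0]
running = 0.0
for t in range(1, T + 1):
    running += psi[t]
    direct.append(running / gamma[t])

# Recursion (3.4), started from Delta*_0 = 0
recursive = [0.0]
for t in range(1, T + 1):
    prev = recursive[-1]
    d_gamma = gamma[t] - gamma[t - 1]
    recursive.append(prev - d_gamma / gamma[t] * prev + psi[t] / gamma[t])

max_gap = max(abs(a - b) for a, b in zip(direct, recursive))
print(max_gap)
```

The two sequences agree up to rounding, as (3.4) is just (2.3) rewritten in incremental form.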

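Lemma 3.1 Suppose that $\psi \in \Psi^M$ and there exists a sequence of invertible random matrices $A_t(\theta)$ such that $A_t^{-1}(\theta) \to 0$ in probability $P^\theta$ and

(E) $$A_t(\theta)\Gamma_t^{-1}(\theta)A_t(\theta) \to \eta(\theta)$$

weakly w.r.t. $P^\theta$, where $\eta(\theta)$ is a random matrix with $\|\eta(\theta)\| < \infty$ $P^\theta$-a.s.;

(1) $$\lim_{t \to \infty} A_t^{-1}(\theta) \sum_{s=1}^{t} \big(\Delta\Gamma_s(\theta)\Delta_{s-1} + R_s(\theta, \Delta_{s-1})\big) = 0$$

in probability $P^\theta$;

(2) $$\lim_{t \to \infty} A_t^{-1}(\theta) \sum_{s=1}^{t} \mathcal{E}_s(\theta) = 0$$

in probability $P^\theta$, where

$$\mathcal{E}_s(\theta) = \Gamma_s(\theta)\Gamma_s^{-1}(\theta + \Delta_{s-1})\big\{\psi_s(\theta + \Delta_{s-1}) - E_\theta\{\psi_s(\theta + \Delta_{s-1}) \mid \mathcal{F}_{s-1}\}\big\} - \psi_s(\theta).$$

Then $A_t(\theta)(\theta_t^* - \hat\theta_t) \to 0$ in probability $P^\theta$ (i.e., $\hat\theta_t$ is locally asymptotically linear).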

Proof. To simplify notation we drop the fixed argument or the index $\theta$ in some of the expressions below. Denote $\delta_t := \hat\theta_t - \theta_t^* = \Delta_t - \Delta_t^*$. Subtracting (3.4) from (3.2) yields the recursive relation

$$\delta_t = (\mathbf{1} - \Gamma_t^{-1}\Delta\Gamma_t)\,\delta_{t-1} + \Gamma_t^{-1}(\varepsilon_t - \varepsilon_t^*) + \Gamma_t^{-1}\big(\Delta\Gamma_t\Delta_{t-1} + R_t(\theta, \Delta_{t-1})\big). \tag{3.5}$$

Denote $H_t := \sum_{s=1}^{t} \big(\Delta\Gamma_s(\theta)\Delta_{s-1} + R_s(\theta, \Delta_{s-1})\big)$ and $M_t := \sum_{s=1}^{t} [\varepsilon_s - \varepsilon_s^*]$. Then the expression

$$\delta_t = \Gamma_t^{-1}\{M_t + H_t + \delta_0\}, \qquad t \ge 1,$$

can easily be obtained by inspecting the difference between the $t$'th and $(t-1)$'th terms of this sequence (exactly in the same way as in (3.3)), to check that (3.5) holds.

Now, (1) implies that $A_t^{-1}H_t \to 0$ in probability $P^\theta$. Also, by (2), $A_t^{-1}M_t = A_t^{-1}(\theta) \sum_{s=1}^{t} \mathcal{E}_s(\theta) \to 0$ in probability $P^\theta$. So, using (E), it follows that $A_t\delta_t \to 0$ in probability $P^\theta$.

The next result gives sufficient conditions for (1) and (2).
Proposition 3.1

(a) Suppose that $A_t(\theta)$ in Lemma 3.1 are diagonal matrices with non-decreasing (w.r.t. $t$) elements and

(L1) $$A_t^{-2}(\theta) \sum_{s=1}^{t} A_s(\theta)\big[\Delta\Gamma_s(\theta)\Delta_{s-1} + R_s(\theta, \Delta_{s-1})\big] \to 0$$

in probability $P^\theta$.
Then (1) holds.

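(b) Suppose that $A_t(\theta)$ in Lemma 3.1 are diagonal non-random matrices, $\psi \in \Psi^M$ and

(L2) $$\lim_{t \to \infty} \frac{1}{(A_t^{(j)}(\theta))^2} \sum_{s=1}^{t} E_\theta\big\{(\mathcal{E}_s^{(j)}(\theta))^2 \mid \mathcal{F}_{s-1}\big\} = 0$$

in probability $P^\theta$, where $A_t^{(j)}(\theta)$ is the $j$-th diagonal element of the matrix $A_t(\theta)$ and $\mathcal{E}_s^{(j)}(\theta)$ is the $j$-th component of $\mathcal{E}_s(\theta)$ which is defined in (2).

Then (2) holds.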

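(c) Suppose that $A_t(\theta)$ in Lemma 3.1 are diagonal with non-decreasing elements $A_t^{(j)}(\theta) \to \infty$, $\psi \in \Psi^M$ and

(LL2) $$\sum_{s=1}^{\infty} \frac{E_\theta\big\{(\mathcal{E}_s^{(j)}(\theta))^2 \mid \mathcal{F}_{s-1}\big\}}{(A_s^{(j)}(\theta))^2} < \infty$$

$P^\theta$-a.s., where $\mathcal{E}_s^{(j)}(\theta)$ is the $j$-th component of $\mathcal{E}_s(\theta)$ which is defined in (2).

Then (2) holds.

Proof. See Appendix A.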

Remark 3.1

Before analyzing the above results, let us understand how the procedure works. Consider the maximum likelihood recursive procedure in the one-dimensional case

$$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1}(\hat\theta_{t-1})\, l_t(\hat\theta_{t-1}), \qquad t \ge 1,$$

where $l_t(\theta) = f_t'(\theta, X_t \mid X_1^{t-1}) / f_t(\theta, X_t \mid X_1^{t-1})$ and $I_t(\theta)$ is the conditional Fisher information.

Denote $\Delta_t = \hat\theta_t - \theta$ and rewrite the above recursion as

$$\Delta_t = \Delta_{t-1} + I_t^{-1}(\theta + \Delta_{t-1})\, l_t(\theta + \Delta_{t-1}).$$

Then,

$$E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} = E_\theta\{\Delta_t - \Delta_{t-1} \mid \mathcal{F}_{t-1}\} = I_t^{-1}(\theta + \Delta_{t-1})\, b_t(\theta, \Delta_{t-1}),$$

where

$$b_t(\theta, u) = E_\theta\{l_t(\theta + u) \mid \mathcal{F}_{t-1}\}.$$

Under usual regularity conditions (see Sharia (2006a) Remark 3.2 for details), $b_t(\theta, 0) = 0$ and $\frac{\partial}{\partial u} b_t(\theta, u)\big|_{u=0} = -i_t(\theta) < 0$, implying that

$$u\, b_t(\theta, u) < 0 \tag{3.6}$$

for small values of $u \ne 0$. Now, assuming that (3.6) holds for all $u \ne 0$, suppose that at time $t - 1$, $\hat\theta_{t-1} < \theta$, that is, $\Delta_{t-1} < 0$. Then, by (3.6), $E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} > 0$. So, the next step $\hat\theta_t$ will be in the direction of $\theta$. If at time $t - 1$, $\hat\theta_{t-1} > \theta$, by the same reason, $E_\theta\{\hat\theta_t - \hat\theta_{t-1} \mid \mathcal{F}_{t-1}\} < 0$. So, on average, at each step the procedure moves towards $\theta$. However, the magnitude of the jumps $\hat\theta_t - \hat\theta_{t-1}$ should decrease, for otherwise, $\hat\theta_t$ may oscillate around $\theta$ without approaching it. On the other hand, care should be taken to ensure that the jumps do not decrease too rapidly to avoid failure of $\hat\theta_t$ to reach $\theta$.

These issues are addressed in Sharia (2006a) and the conditions are introduced to ensure global convergence of (3.1), that is, convergence for any arbitrary starting value. These conditions are flexible enough to allow for a quite wide choice of the normalising sequence $\Gamma$ for any particular $\psi$.

Remark 3.2

(i) As was mentioned above, strong consistency of the recursive estimator $\hat\theta_t$, that is the convergence $\Delta_t = \hat\theta_t - \theta \to 0$ ($P^\theta$-a.s.) is established in Sharia (2006a). Here we are interested in the asymptotic behaviour of the recursive estimator given that it is consistent. Note that although consistency is not formally required in Lemma 3.1, it is easy to see that if $\hat\theta_t$ is not consistent, conditions (1) and (2) will be satisfied for very special cases only. Note also that given that $\Delta_t = \hat\theta_t - \theta \to 0$, conditions (1) and (2) are local in the sense that they are determined by local (w.r.t. the parameter) behaviour of the functions involved.

(ii) Condition (E) is an ergodicity type assumption on the statistical model. If $\Gamma_t(\theta) = I_t(\theta)$ (the conditional Fisher information) and $A_t(\theta)$ and $\eta(\theta)$ are non-random, then the model is called ergodic. Further discussion of this concept and related work appears in Basawa and Scott (1983), Hall and Heyde (1980) § 6.2, and Barndorff-Nielsen and Sorensen (1994).

(iii) Let us examine condition (2) in Lemma 3.1. Given that $\Delta_t = \hat\theta_t - \theta \to 0$, if the functions $\psi_t(\theta)$ and $\Gamma_t(\theta)$ are continuous w.r.t. $\theta$ (with certain uniformity w.r.t. $t$), we expect $\mathcal{E}_t(\theta) \to 0$. Parts (b) and (c) in Proposition 3.1 give sufficient conditions for (2). If there exists a non-random sequence $A_t(\theta)$, then obviously (L2) is less restrictive than (LL2). But unfortunately, (L2) can only be used for non-random $A_t(\theta)$. In the case of random $A_t(\theta)$, when (LL2) may be used, just the convergence $E_\theta\{(\mathcal{E}_t^{(j)}(\theta))^2 \mid \mathcal{F}_{t-1}\} \to 0$ may not be enough since in many models the components of $A_t(\theta)$ have the rate $\sqrt{t}$. In such cases one may also use the result on the rate of convergence of $\hat\theta_t$ presented in Sharia (2006b) (see examples 4.1 and 4.3 in the next section).

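(iv) Condition (1) gives an important clue for an optimal choice of the normalizing sequence $\Gamma_t(\theta)$. To see this, let us assume that $\psi \in \Psi^M$ so that $R_t(\theta, 0) = 0$ and have a look at (1) and (L1) in the case of a one dimensional parameter $\theta \in \mathbb{R}$. Now we can write

$$\Delta\Gamma_t(\theta)\Delta_{t-1} + R_t(\theta, \Delta_{t-1}) = \left(\Delta\Gamma_t(\theta) + \frac{R_t(\theta, \Delta_{t-1}) - R_t(\theta, 0)}{\Delta_{t-1}}\right)\Delta_{t-1}.$$

In most applications, the rate of $A_t$ is $\sqrt{t}$ and the best one can hope for is that $\sqrt{t}\,\Delta_t$ is stochastically bounded. Therefore we must at least have the convergence $\Delta\Gamma_t(\theta) + (R_t(\theta, \Delta_{t-1}) - R_t(\theta, 0))/\Delta_{t-1} \to 0$. Given that $\Delta_{t-1} \to 0$ we expect $\Delta\Gamma_t(\theta) \approx -\frac{\partial}{\partial u} R_t(\theta, u)\big|_{u=0}$ for large $t$'s. Also, since $R_t(\theta, 0) = E_\theta\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = 0$, if $\Gamma_t(\theta)/\Gamma_t(\theta + u)$ is smooth in $u = 0$, we can write that $\frac{\partial}{\partial u} R_t(\theta, u)\big|_{u=0} = \frac{\partial}{\partial u} E_\theta\{\psi_t(\theta + u) \mid \mathcal{F}_{t-1}\}\big|_{u=0}$. So, denoting

$$b_t(\theta, u) = E_\theta\{\psi_t(\theta + u) \mid \mathcal{F}_{t-1}\}$$

we expect

$$\Delta\Gamma_t(\theta) \approx -b_t'(\theta, 0), \tag{3.7}$$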

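where

$$b_t'(\theta, 0) = \frac{\partial}{\partial u} b_t(\theta, u)\Big|_{u=0}.$$

Using similar arguments, for the multidimensional case, we expect (3.7) to hold for large $t$'s, where $b_t'(\theta, 0)$ is the total differential of $b_t(\theta, u)$ in $u = 0$. Therefore,

$$\Gamma_t(\theta) = -\sum_{s=1}^{t} b_s'(\theta, 0) \tag{3.8}$$

is an obvious candidate for the normalizing sequence. If $\psi_t(\theta)$ is differentiable in $\theta$ and differentiation of $b_t(\theta, u) = E_\theta\{\psi_t(\theta + u) \mid \mathcal{F}_{t-1}\}$ is allowed under the integral sign, then $b_t'(\theta, 0) = E_\theta\{\dot\psi_t(\theta) \mid \mathcal{F}_{t-1}\}$. This implies that, for a given sequence of estimating functions $\psi_t(\theta)$, another possible choice of the normalizing sequence is

$$\Gamma_t(\theta) = -\sum_{s=1}^{t} E_\theta\{\dot\psi_s(\theta) \mid \mathcal{F}_{s-1}\}, \tag{3.9}$$

or any sequence with the increments

$$\Delta\Gamma_t(\theta) = \Gamma_t(\theta) - \Gamma_{t-1}(\theta) = -E_\theta\{\dot\psi_t(\theta) \mid \mathcal{F}_{t-1}\}.$$

Also, if the differentiation w.r.t. $\theta$ of

$$0 = E_\theta\{\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = \int \psi_t(\theta, z \mid X_1^{t-1})\, f_t(\theta, z \mid X_1^{t-1})\,\mu(dz)$$

is allowed under the integral sign, then by the product rule,

$$0 = \int \dot\psi_t(\theta, z \mid X_1^{t-1})\, f_t(\theta, z \mid X_1^{t-1})\,\mu(dz) + \int \psi_t(\theta, z \mid X_1^{t-1})\, \dot f_t(\theta, z \mid X_1^{t-1})\,\mu(dz).$$

So,

$$E_\theta\{\dot\psi_t(\theta) \mid \mathcal{F}_{t-1}\} = \int \dot\psi_t(\theta, z \mid X_1^{t-1})\, f_t(\theta, z \mid X_1^{t-1})\,\mu(dz)$$
$$= -\int \psi_t(\theta, z \mid X_1^{t-1})\, \dot f_t(\theta, z \mid X_1^{t-1})\,\mu(dz)$$
$$= -\int \psi_t(\theta, z \mid X_1^{t-1})\, l_t^T(\theta, z \mid X_1^{t-1})\, f_t(\theta, z \mid X_1^{t-1})\,\mu(dz) \tag{3.10}$$
$$= -E_\theta\{\psi_t(\theta)\, l_t^T(\theta) \mid \mathcal{F}_{t-1}\},$$

where, as before, $l_t(\theta) = \dot f_t^T(\theta, X_t \mid X_1^{t-1}) / f_t(\theta, X_t \mid X_1^{t-1})$. Therefore, denoting

$$\gamma_t^\psi(\theta) = E_\theta\{\psi_t(\theta)\, l_t^T(\theta) \mid \mathcal{F}_{t-1}\},$$

another possible choice of the normalizing sequence is

$$\Gamma_t(\theta) = \sum_{s=1}^{t} \gamma_s^\psi(\theta), \tag{3.11}$$

or any sequence with the increments

$$\Delta\Gamma_t(\theta) = \Gamma_t(\theta) - \Gamma_{t-1}(\theta) = \gamma_t^\psi(\theta).$$

Since typically, for each $\theta$, the process

$$M_t^\theta = \sum_{s=1}^{t} \psi_s(\theta)$$

is a $P^\theta$-martingale, (3.11) can be rewritten as

$$\Gamma_t(\theta) = \langle M^\theta, U^\theta \rangle_t,$$

where $U_t^\theta = \sum_{s=1}^{t} l_s(\theta)$ is the score martingale.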

(v) Part (iv) above highlights a very important point. Suppose we wish to construct a recursive estimator with a given sequence of estimating functions. In order to achieve consistency, we are quite flexible in the choice of the normalizing sequence $\Gamma$; the recursive procedure will converge even when the $\Gamma$ sequence is not related to $\psi$ (see Sharia (2006a)). (Of course, the rate of the normalizing sequence still has to be "right" but is mostly determined by the model.) If we want to obtain a recursive estimator which is also asymptotically linear, then the normalizing sequence $\Gamma$ has to be (3.8) (or (3.9), (3.11), or a sequence asymptotically equivalent to (3.8)).

(vi) Let us consider a likelihood case, that is $\psi_t(\theta) = l_t(\theta)$. Since $\gamma_t^l(\theta) = i_t(\theta)$, the process (3.11) in this case is the conditional Fisher information $I_t(\theta) = \sum_{s=1}^{t} i_s(\theta)$. So, the corresponding recursive procedure is

$$\hat\theta_t = \hat\theta_{t-1} + I_t^{-1}(\hat\theta_{t-1})\, l_t(\hat\theta_{t-1}), \qquad t \ge 1. \tag{3.12}$$

Also, given that the model possesses certain ergodicity properties, asymptotic linearity of (3.12) implies asymptotic efficiency. In particular, in the case of i.i.d. observations, it follows that the above recursive procedure is asymptotically normal with parameters $(0, i^{-1}(\theta))$ (see Corollary 4.1 in Section 4).

(vii) Normalizing sequences suggested in (iv) have been derived from asymptotic considerations. In practice however, behaviour of the $\Gamma$ sequence for the first several steps might also be important. This can happen when the number of observations is small or even moderately large. According to (iv), to achieve asymptotic linearity, one has to choose a normalizing sequence $\Gamma$ with the property that

$$\Delta\Gamma_t(\theta) \approx -b_t'(\theta, 0)$$

for large $t$'s. So, we can consider any sequence of the form $C + c_t\Gamma_t$, where $\Gamma_t$ is one of the sequences introduced above (by (3.8), (3.9), or (3.11)), $c_t$ is a sequence of non-negative r.v.'s such that $c_t = 1$ eventually and $C$ is a suitably chosen constant. In practice, $c_t$ and $C$ can be treated as tuning constants to control behaviour of the procedure for the first several steps (see Sharia (2006a), Remark 4.4). Under certain assumptions, at each step, the recursive procedure (3.1) (on average) moves towards the direction of the unknown parameter (see Remark 3.1 or Sharia (2006a), Remark 3.2 for details). Nevertheless, if the values of the normalizing sequence are too small for the first several steps, then the procedure will oscillate excessively around the true value of the parameter. On the other hand, too large values of the normalizing sequence will result in slower convergence of the procedure. A good balance can be achieved by using the tuning constants. The detailed discussion of these and related topics will appear elsewhere, but as a rough guide, the graph of $\hat\theta_t$ against $t$ should ideally have a shape of those in Figure 1 in Sharia (2006a) (that is, a reasonable oscillation at the beginning of the procedure before settling down at a particular level).
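
The likelihood recursion (3.12) of Remark 3.2(vi) can be sketched for a simple dependent model. As an assumption made only for this illustration, take the Gaussian autoregression $X_t = \theta X_{t-1} + \xi_t$ with i.i.d. standard normal $\xi_t$, so that $l_t(\theta) = X_{t-1}(X_t - \theta X_{t-1})$, $i_t(\theta) = X_{t-1}^2$ and $I_t(\theta) = \sum_{s \le t} X_{s-1}^2$. The model, seed, sample size and function names are not taken from the paper.

```python
import random


def simulate_ar1(theta, n, x0, rng):
    """Simulate X_t = theta * X_{t-1} + xi_t with standard normal noise."""
    xs = [x0]
    for _ in range(n):
        xs.append(theta * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def recursive_ml_ar1(xs, theta0):
    """Recursion (3.12) for the assumed AR(1) model."""
    theta = theta0
    fisher = 0.0                               # conditional Fisher information I_t
    for t in range(1, len(xs)):
        prev, cur = xs[t - 1], xs[t]
        fisher += prev * prev                  # i_t = X_{t-1}^2 (free of theta here)
        score = prev * (cur - theta * prev)    # l_t evaluated at the previous estimate
        theta += score / fisher
    return theta


rng = random.Random(2)
xs = simulate_ar1(theta=0.5, n=2000, x0=1.0, rng=rng)   # x0 != 0 keeps I_1 > 0
recursive_est = recursive_ml_ar1(xs, theta0=0.0)
batch_est = sum(a * b for a, b in zip(xs[:-1], xs[1:])) / sum(a * a for a in xs[:-1])
print(recursive_est, batch_est)
```

In this particular model $l_t$ is linear in $\theta$, so the recursive estimator coincides with the batch maximum likelihood (least squares) estimator; for nonlinear $l_t$ the two would in general only be asymptotically equivalent.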

4 Special models and examples

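4.1. The i.i.d. scheme. Consider the classical scheme of i.i.d. observations $X_1, X_2, \dots$, with a common probability density/mass function $f(\theta, x)$, $\theta \in \mathbb{R}^m$. Suppose that $\psi(\theta, x)$ is an estimating function with

$$E_\theta\{\psi(\theta, X_1)\} = \int \psi(\theta, z) f(\theta, z)\,\mu(dz) = 0.$$

Let us define the recursive estimator $\hat\theta_t$ by

$$\hat\theta_t = \hat\theta_{t-1} + \frac{1}{t}\gamma^{-1}(\hat\theta_{t-1})\, \psi(\hat\theta_{t-1}, X_t), \qquad t \ge 1, \tag{4.1}$$

where $\hat\theta_0 \in \mathbb{R}^m$ is any initial value. According to Remark 3.2 (iv) and the condition (V) below, an optimal choice of $\gamma(\theta)$ would be either

$$\gamma(\theta) = -E_\theta\{\dot\psi(\theta, X_1)\}$$

or

$$\gamma(\theta) = E_\theta\{\psi(\theta, X_1)\, l^T(\theta, X_1)\} \quad \text{where} \quad l(\theta, x) = \dot f^T(\theta, x)/f(\theta, x),$$

or any non-random invertible matrix function that satisfies conditions listed below.

Suppose that

$$j_\psi(\theta) = \int \psi(\theta, z)\psi^T(\theta, z) f(\theta, z)\,\mu(dz) < \infty$$

and consider the following conditions.

(I) For any $0 < \varepsilon < 1$,

$$\sup_{\varepsilon \le \|u\| \le 1/\varepsilon} u^T \gamma^{-1}(\theta + u) \int \psi(\theta + u, x) f(\theta, x)\,\mu(dx) < 0.$$

(II) For each $u \in \mathbb{R}^m$,

$$\int \big\|\gamma^{-1}(\theta + u)\psi(\theta + u, x)\big\|^2 f(\theta, x)\,\mu(dx) \le K_\theta(1 + \|u\|^2)$$

for some constant $K_\theta$.

(III) $\gamma(\theta)$ is continuous in $\theta$.

(IV)

$$\lim_{u \to 0} \int \|\psi(\theta + u, x) - \psi(\theta, x)\|^2 f(\theta, x)\,\mu(dx) = 0.$$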

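(V)

$$\int \psi(\theta + u, x) f(\theta, x)\,\mu(dx) = -\gamma(\theta)u + \alpha^\theta(u),$$

where $\alpha^\theta(u) = o(\|u\|^{1+\varepsilon})$ as $u \to 0$ for some $\varepsilon > 0$.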

Corollary 4.1 Suppose that for any $\theta \in \mathbb{R}^m$ conditions (I) - (V) are satisfied. Then the estimator $\hat\theta_t$ is strongly consistent and $t^\delta(\hat\theta_t - \theta) \to 0$ ($P^\theta$-a.s.) for any $0 < \delta < 1/2$ and any initial value $\hat\theta_0$. Furthermore, $\hat\theta_t$ is asymptotically normal with parameters $(0, \gamma^{-1}(\theta)\, j_\psi(\theta)\, \gamma^{-1}(\theta))$, that is,

$$\mathcal{L}\big(t^{1/2}(\hat\theta_t - \theta) \mid P^\theta\big) \to \mathcal{N}\big(0, \gamma^{-1}(\theta)\, j_\psi(\theta)\, \gamma^{-1}(\theta)\big).$$

In particular, in the case of the maximum likelihood type recursive procedure with $\psi(\theta, x) = \dot f^T(\theta, x)/f(\theta, x)$ and $\gamma(\theta) = i(\theta) = j_l(\theta)$, the estimator $\hat\theta_t$ is asymptotically efficient (i.e., asymptotically normal with parameters $(0, i^{-1}(\theta))$).

Proof. See Appendix A.

Similar results (for i.i.d. schemes) were obtained by Khas'minskii and Nevelson (1972) (when $\psi(\theta, x) = l(\theta, x)$ and $\gamma(\theta) = i(\theta)$, Ch.8, §4) and Fabian (1978).

4.2. Linear procedures. Consider the recursive procedure

$$\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(h_t - \gamma_t\hat\theta_{t-1}), \qquad t \ge 1, \tag{4.2}$$

where $\Gamma_t$ and $\gamma_t$ are predictable matrix processes, $h_t$ is an adapted process (i.e., $h_t$ is $\mathcal{F}_t$-measurable for $t \ge 1$) and all three are independent of $\theta$. The following result gives a set of sufficient conditions for the asymptotic linearity of the estimator defined by (4.2) in the case when the linear $\psi_t(\theta) = h_t - \gamma_t\theta$ is a martingale-difference, i.e., $E_\theta\{h_t \mid \mathcal{F}_{t-1}\} = \gamma_t\theta$, for $t \ge 1$.

Corollary 4.2 Suppose that $\Gamma_t \to \infty$ and

$$\Gamma_t^{-1/2} \sum_{s=1}^{t} (\Delta\Gamma_s - \gamma_s)\Delta_{s-1} \to 0 \tag{4.3}$$

in probability $P^\theta$, where $\Delta_{t-1} = \hat\theta_{t-1} - \theta$. Then the recursive estimator defined by (4.2) is asymptotically linear with

$$\Gamma_t^{1/2}(\hat\theta_t - \theta) = \Gamma_t^{-1/2} \sum_{s=1}^{t} \psi_s(\theta) + o_{P^\theta}(1), \tag{4.4}$$

where $o_{P^\theta}(1) \to 0$ in probability $P^\theta$.

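Proof Let us check the conditions of Lemma 3.1 for $A_t(\theta) = \Gamma_t^{1/2}$. Condition (E) trivially holds. Then, since $\psi_t(\theta) = h_t - \gamma_t\theta$ and

\[ b_t(\theta,u) = E_\theta\{\psi_t(\theta+u) \mid \mathcal{F}_{t-1}\} = E_\theta\{(h_t - \gamma_t(\theta+u)) \mid \mathcal{F}_{t-1}\} = -\gamma_t u, \]

we have

\[ R_t(\theta,u) = \Gamma_t(\theta)\,\Gamma_t^{-1}(\theta+u)\,b_t(\theta,u) = -\gamma_t u. \]

Therefore, (1) is equivalent to (4.3). Then, it is easy to see that for $\mathcal{E}_s(\theta)$ defined in (2) we have

\[ \mathcal{E}_s(\theta) = \psi_s(\theta+\Delta_{s-1}) - b_s(\theta,\Delta_{s-1}) - \psi_s(\theta) = 0, \]

implying that (2) holds, which completes the proof. ◊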
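
As a hedged numerical illustration of the linear procedure (4.2), the sketch below treats the scalar case in which the increments of the normalizing sequence equal $\gamma_t$, so that the term $(\Delta\Gamma_s - \gamma_s)\Delta_{s-1}$ in (4.3) vanishes identically. Multiplying the recursion by $\Gamma_t$ gives $\Gamma_t\hat\theta_t = \Gamma_{t-1}\hat\theta_{t-1} + h_t$, hence $\hat\theta_t = \Gamma_t^{-1}(\Gamma_0\hat\theta_0 + \sum_{s \le t} h_s)$. The initial value $\Gamma_0 = 1$, the function name `linear_recursion` and the simulated $h_t$, $\gamma_t$ are assumptions made only for this example.

```python
import random

def linear_recursion(h, gam, theta0, Gamma0=1.0):
    """Scalar version of (4.2) with Gamma_t = Gamma_{t-1} + gamma_t."""
    theta, Gamma = theta0, Gamma0
    for h_t, g_t in zip(h, gam):
        Gamma += g_t                          # increment of Gamma_t equals gamma_t
        theta += (h_t - g_t * theta) / Gamma  # recursion (4.2)
    return theta, Gamma

random.seed(1)
true_theta = 0.7
gam = [random.uniform(0.5, 2.0) for _ in range(200)]
# h_t has conditional mean gamma_t * theta (martingale-difference assumption)
h = [g * true_theta + random.gauss(0.0, 1.0) for g in gam]

theta_rec, Gamma_t = linear_recursion(h, gam, theta0=5.0)
# Closed form obtained by telescoping, with Gamma_0 = 1
theta_closed = (1.0 * 5.0 + sum(h)) / Gamma_t
```

The two values coincide up to rounding, which shows that the influence of the starting value dies out exactly when $\Gamma_t$ grows without bound.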



Remark 4.1 Condition (4.3) trivially holds if $\Delta\Gamma_t = \gamma_t$, that is, $\Gamma_t = \sum_{s=1}^{t}\gamma_s$. In this case, the solution of (4.2) is

(4.5) $\hat\theta_t = \Gamma_t^{-1}\Big(\hat\theta_0 + \sum_{s=1}^{t} h_s(X_s)\Big).$

This can be easily seen by inspecting the difference $\hat\theta_t - \hat\theta_{t-1}$ for the sequence (4.5) (exactly in the same way as in (3.3)), to check that (4.2) holds. Also, since (4.5) can obviously be rewritten as

\[ \hat\theta_t = \Gamma_t^{-1}\hat\theta_0 + \Gamma_t^{-1}\sum_{s=1}^{t}\big(h_s(X_s) - \gamma_s\theta\big) + \theta, \]

it follows that in this case, $\Gamma_t \to \infty$ is indeed an obvious necessary and sufficient condition for $\hat\theta_t$ to be asymptotically linear (for arbitrary starting value $\hat\theta_0$).

4.3. Exponential family of Markov processes. Consider a conditional exponential family of Markov processes in the sense of Feigin (1981) (see also Barndorff-Nielsen (1988)). This is a time homogeneous Markov chain with the one-step transition density

\[ f(y;\theta,x) = h(x,y)\exp\big(\theta^{T}m(y,x) - \beta(\theta;x)\big), \]

where $m(y,x)$ is an $m$-dimensional vector and $\beta(\theta;x)$ is one dimensional. Then in our notation $f_t(\theta) = f(X_t;\theta,X_{t-1})$ and

\[ l_t(\theta) = \Big(\frac{d}{d\theta}\log f_t(\theta)\Big)^{T} = m(X_t,X_{t-1}) - \dot\beta^{T}(\theta;X_{t-1}). \]

It follows from standard exponential family theory (see, e.g., Feigin (1981)) that $l_t(\theta)$ is a martingale-difference and the conditional Fisher information is

\[ I_t(\theta) = \sum_{s=1}^{t}\ddot\beta(\theta;X_{s-1}). \]

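A maximum likelihood type recursive procedure can be defined as

\[ \hat\theta_t = \hat\theta_{t-1} + \Big(\sum_{s=1}^{t}\ddot\beta(\hat\theta_{t-1};X_{s-1})\Big)^{-1}\big(m(X_t,X_{t-1}) - \dot\beta^{T}(\hat\theta_{t-1};X_{t-1})\big), \quad t \ge 1. \]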

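Now suppose that $\theta$ is one dimensional and the process belongs to the conditionally additive exponential family, that is,

\[ f(y;\theta,x) = h(x,y)\exp\big(\theta\,m(y,x) - \beta(\theta;x)\big), \]

with

(4.6) $\beta(\theta;x) = \gamma(\theta)\,h(x)$

where $h(\cdot) \ge 0$ and $\ddot\gamma(\cdot) \ge 0$ (see Feigin (1981)). Then,

\[ I_t(\theta) = \ddot\gamma(\theta)\,H_t \quad\text{where}\quad H_t = \sum_{s=1}^{t} h(X_{s-1}). \]

Assuming that $\ddot\gamma(\theta) \ne 0$, the likelihood recursive procedure is

(4.7) $\hat\theta_t = \hat\theta_{t-1} + \dfrac{1}{\ddot\gamma(\hat\theta_{t-1})\,H_t}\big(m(X_t,X_{t-1}) - \dot\gamma(\hat\theta_{t-1})\,h(X_{t-1})\big).$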
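
To make the one-dimensional likelihood recursion concrete, the following sketch uses a hypothetical model that is not taken from the paper: $X_t \mid X_{t-1} = x \sim \mathrm{Poisson}(e^{\theta}(1+x))$. Its transition density is $\frac{(1+x)^y}{y!}\exp\{\theta y - e^{\theta}(1+x)\}$, so that $m(y,x) = y$, $h(x) = 1 + x$ and $\gamma(\theta) = e^{\theta}$; here $\gamma$ coincides with its first and second derivatives, which keeps the update simple. The function name `poisson_update` is likewise an assumption of the example.

```python
import math

def poisson_update(theta_prev, H_prev, x_prev, x_new):
    """One likelihood-type step for X_t | X_{t-1}=x ~ Poisson(exp(theta) * (1 + x)).

    Returns the updated estimate and the updated sum H_t = H_{t-1} + h(X_{t-1}).
    """
    h = 1.0 + x_prev              # h(X_{t-1}) in this hypothetical model
    H = H_prev + h                # H_t
    g = math.exp(theta_prev)      # gamma and its derivatives, all equal to exp(theta)
    score = x_new - g * h         # m(X_t, X_{t-1}) minus its conditional mean
    return theta_prev + score / (g * H), H

# If the new observation equals its conditional mean, the estimate does not move.
theta_fixed, H_fixed = poisson_update(math.log(0.5), 10.0, 3, 2)
# A larger-than-expected observation pushes the estimate upwards.
theta_up, H_up = poisson_update(0.0, 0.0, 0, 3)
```

The step size is governed by $1/H_t$, so the adjustments shrink as information accumulates, and the sign of each adjustment is the sign of the one-step prediction error.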

Remark 4.2 Consistency and rate of convergence of the estimator derived by (4.7) are studied in Sharia (2006b). To ensure that (4.7) has the same asymptotic properties as the maximum likelihood estimator, one has to impose certain restrictions on $\gamma(\cdot)$ and $H_t$. In Corollary A1 in Appendix A, the conditions of Section 3 written in terms of this model are presented. These conditions will be satisfied if there is a certain balance between requirements of smoothness on $\gamma(\cdot)$, the rate at which $H_t \to \infty$, and ergodicity
of the model. For instance, suppose that the model is ergodic, that is, there 
exists a non-random sequence Ht such that Ht/ Ht ^ rj < oo weakly. Then 



will hold if the process 



m 
) 



converges to zero (criterion based on the Lenglart-Rebolledo inequality, see (L2) and formula (A5) in Appendix A). So, assuming that the estimator is consistent (that is, $\Delta_t \to 0$), by the Toeplitz lemma, the above will be guaranteed by the continuity of $\ddot\gamma(\cdot)$. On the other hand, if the model is non-ergodic, then one may need to impose smoothness of higher order on the function $\gamma(\cdot)$ (see condition (iii) below) and restrictions on the growth of the sequence $H_t$ (see condition (i) below). The following result gives one possible set of sufficient conditions for the recursive estimator to be consistent and to have the same asymptotic properties as the maximum likelihood estimator.



Proposition 4.3 Suppose that $H_t \to \infty$ and
(i) 

hjXt) 
Ht ' 

(ii) there exists a constant B such that 

l + 7'(«) 



7^(«) 

for each u 



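(iii) The function $\ddot\gamma(\cdot)$ is locally Lipschitz, that is, for any $\theta$ there exist a constant $K_\theta$ and $0 < \varepsilon_\theta \le 1/2$ such that

\[ |\ddot\gamma(\theta+u) - \ddot\gamma(\theta)| \le K_\theta\,|u|^{\varepsilon_\theta} \]

for small $u$'s.

Then $\hat\theta_t$ defined by (4.7) is strongly consistent (i.e., $\hat\theta_t \to \theta$ $P^{\theta}$-a.s.) for any initial value $\hat\theta_0$. Furthermore, $H_t^{\delta}(\hat\theta_t - \theta) \to 0$ $P^{\theta}$-a.s. for any $\delta \in\, ]0, 1/2[$, and $\hat\theta_t$ is asymptotically linear with

(4.8) $H_t^{1/2}(\hat\theta_t - \theta) = \dfrac{H_t^{-1/2}}{\ddot\gamma(\theta)}\sum_{s=1}^{t}\big(m(X_s,X_{s-1}) - \dot\gamma(\theta)\,h(X_{s-1})\big) + o_{P^{\theta}}(1),$

where $o_{P^{\theta}}(1) \to 0$ in probability $P^{\theta}$.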

4.4. AR(m) process. Consider an AR(m) process

\[ X_t = \theta_1X_{t-1} + \cdots + \theta_mX_{t-m} + \xi_t = \theta^{T}X_{t-m}^{t-1} + \xi_t, \]

where $X_{t-m}^{t-1} = (X_{t-1}, \ldots, X_{t-m})^{T}$, $\theta = (\theta_1, \ldots, \theta_m)^{T}$ and $\xi_t$ is a sequence of i.i.d. random variables.

In Sharia (2006a) we discuss convergence of the recursive estimators of the form

(4.9) $\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}(\hat\theta_{t-1})\,\psi_t\big(X_t - \hat\theta_{t-1}^{T}X_{t-m}^{t-1}\big),$

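where $\psi_t(z)$ and $\Gamma_t(z)$ ($z \in \mathbb{R}^m$) are respectively suitably chosen vector and matrix processes. If the probability density function of $\xi_t$ w.r.t. Lebesgue's measure is $g(x)$ then the conditional probability density function of $X_t$ given values of past observations $X_{t-m}^{t-1} = x_{t-m}^{t-1}$ is obviously

\[ f_t(\theta, x_t \mid x_{t-m}^{t-1}) = g(x_t - \theta^{T}x_{t-m}^{t-1}), \]

and so,

\[ \frac{\dot f_t^{T}(\theta, X_t \mid X_{t-m}^{t-1})}{f_t(\theta, X_t \mid X_{t-m}^{t-1})} = -\frac{g'(X_t - \theta^{T}X_{t-m}^{t-1})}{g(X_t - \theta^{T}X_{t-m}^{t-1})}\,X_{t-m}^{t-1}. \]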

It follows from the results of Section 3 (see Remark 3.2 (vi)) that an optimal choice of the normalizing sequence is the conditional Fisher information $I_t(\theta)$ (or any sequence with the increments equal to $\Delta I_t(\theta)$). It is easy to see that in this case,

\[ I_t(\theta) = I_t = i^{g}\sum_{s=1}^{t}X_{s-m}^{s-1}\big(X_{s-m}^{s-1}\big)^{T}, \]

where

\[ i^{g} = \int\Big(\frac{g'(z)}{g(z)}\Big)^{2}g(z)\,dz. \]

Since in this case the conditional Fisher information can also be found recursively, a likelihood recursive procedure is

(4.10) $\hat\theta_t = \hat\theta_{t-1} - I_t^{-1}X_{t-m}^{t-1}\,\dfrac{g'}{g}\big(X_t - \hat\theta_{t-1}^{T}X_{t-m}^{t-1}\big), \qquad I_t = I_{t-1} + i^{g}X_{t-m}^{t-1}\big(X_{t-m}^{t-1}\big)^{T},$

for $t \ge 1$ and an arbitrary starting point $\hat\theta_0$. The strong consistency of the estimators (4.9) and, in particular, that of (4.10) is studied in Sharia (2006a).
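
For standard Gaussian innovations one has $g'(x)/g(x) = -x$ and unit Fisher information, and the likelihood recursion reduces to recursive least squares. The sketch below, for an AR(1) process only, checks numerically that this recursion started from zero information reproduces the ordinary (batch) least squares estimator. The function name `recursive_ls_ar1`, the seed, the series length and the true parameter are assumptions made for the illustration.

```python
import random

def recursive_ls_ar1(x, theta0=0.0):
    """Recursive least squares for AR(1): information I_t = I_{t-1} + X_{t-1}^2."""
    theta, info = theta0, 0.0
    for t in range(1, len(x)):
        info += x[t - 1] ** 2
        # Gaussian score step: correction proportional to the prediction error
        theta += x[t - 1] * (x[t] - theta * x[t - 1]) / info
    return theta

random.seed(2)
x = [random.gauss(0.0, 1.0)]
for _ in range(300):
    x.append(0.6 * x[-1] + random.gauss(0.0, 1.0))

theta_rec = recursive_ls_ar1(x, theta0=3.0)
theta_batch = sum(a * b for a, b in zip(x[:-1], x[1:])) / sum(a * a for a in x[:-1])
```

The agreement holds because $I_t\hat\theta_t = I_{t-1}\hat\theta_{t-1} + X_{t-1}X_t$ telescopes; with a non-Gaussian $g$ the recursive and batch estimators no longer coincide for finite $t$.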

The class of estimators (4.9) includes recursive versions of robust modifications of the least squares method. These are recursive estimators defined by

(4.11) $\hat\theta_t = \hat\theta_{t-1} + \Gamma_t^{-1}\,\gamma(X_{t-m}^{t-1})\,\phi\big(X_t - \hat\theta_{t-1}^{T}X_{t-m}^{t-1}\big),$

where $\phi$ is a bounded scalar function and $\gamma(u)$ is a vector function of the form $u\,h(u)$ for some non-negative function $h$ of $u$.

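Since (4.11) is of the form (3.1) with $\psi_t(\theta) = \gamma(X_{t-m}^{t-1})\,\phi\big(X_t - \theta^{T}X_{t-m}^{t-1}\big)$, assuming that $\phi(\cdot)$ is differentiable (almost everywhere w.r.t. Lebesgue's measure) we obtain

\[
\begin{aligned}
E_\theta\{\dot\psi_s(\theta) \mid \mathcal{F}_{s-1}\}
&= -\gamma(X_{s-m}^{s-1})\big(X_{s-m}^{s-1}\big)^{T}\,E_\theta\{\phi'(X_s - \theta^{T}X_{s-m}^{s-1}) \mid \mathcal{F}_{s-1}\} \\
&= -\gamma(X_{s-m}^{s-1})\big(X_{s-m}^{s-1}\big)^{T}\int\phi'(x - \theta^{T}X_{s-m}^{s-1})\,g(x - \theta^{T}X_{s-m}^{s-1})\,dx \\
&= -\gamma(X_{s-m}^{s-1})\big(X_{s-m}^{s-1}\big)^{T}\int\phi'(x)\,g(x)\,dx.
\end{aligned}
\]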

So, according to Lemma 3.1 (see Remark 3.2 (iv), formula (3.9)), an optimal normalizing sequence for (4.11) is

(4.12) $\Gamma_t(\theta) = c_\phi\sum_{s=1}^{t}\gamma(X_{s-m}^{s-1})\big(X_{s-m}^{s-1}\big)^{T}$

where

\[ c_\phi = \int\phi'(x)\,g(x)\,dx \]

or a sequence with the increments equal to $c_\phi\,\gamma(X_{t-m}^{t-1})\big(X_{t-m}^{t-1}\big)^{T}$.

Consider for instance a recursive M-estimator of the parameter of an 
AR(1) process defined as 

(4.13) = 9t_, + ls,0, (^] sA, 

where the two scaling constants (one of them $s_r$) are scale estimates and $\phi_c$ is the Huber function,

\[ \phi_c(x) = \begin{cases} x, & \text{if } |x| \le c, \\ c\,\operatorname{sign}(x), & \text{if } |x| > c, \end{cases} \]

and $c > 0$ is a tuning constant. This is a recursive version of a robust generalized M-estimator of the parameter of an AR(1) process proposed by Denby and Martin (1979).
Another example is

(4.14) Ct = Ct-l + ^S^(f)a,p ( — ^ ) -Sr^a,/? (— '±:1^±±\ 

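where $\phi_{\alpha,\beta}$ is Hampel's two-part redescending function

(4.15) $\phi_{\alpha,\beta}(x) = \begin{cases} x, & \text{if } |x| \le \alpha, \\ \alpha(\beta - x)/(\beta - \alpha), & \text{if } \alpha < x \le \beta, \\ -\alpha(\beta + x)/(\beta - \alpha), & \text{if } -\beta \le x < -\alpha, \\ 0, & \text{if } |x| > \beta, \end{cases}$

with tuning constants $0 < \alpha < \beta$.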
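For the procedure (4.13),

\[ c_\phi = \int\phi'(x)\,g(x)\,dx = \int s_r\,\frac{d}{dx}\phi_c\Big(\frac{x}{s_r}\Big)g(x)\,dx = \int\phi_c'\Big(\frac{x}{s_r}\Big)g(x)\,dx, \]

and so

(4.16) $c_\phi = \displaystyle\int_{-c\,s_r}^{c\,s_r} g(x)\,dx.$

Similarly, for (4.14),

(4.17) $c_\phi = \displaystyle\int_{-\alpha s_r}^{\alpha s_r} g(x)\,dx - \frac{\alpha}{\beta - \alpha}\Big(\int_{-\beta s_r}^{-\alpha s_r} g(x)\,dx + \int_{\alpha s_r}^{\beta s_r} g(x)\,dx\Big).$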
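
The Huber and Hampel functions and the two constants above can be sketched in a few lines. The example assumes that $g$ is the density of $N(0, s^2)$, as in the simulation that follows, so that each integral reduces to a difference of normal distribution functions; the function names and the default values $s_r = s = 1$ are illustrative assumptions.

```python
import math

def huber(x, c):
    """Huber function: identity on [-c, c], clipped to +/-c outside."""
    return x if abs(x) <= c else math.copysign(c, x)

def hampel(x, a, b):
    """Hampel's two-part redescending function with 0 < a < b."""
    if abs(x) <= a:
        return x
    if abs(x) <= b:
        return math.copysign(a * (b - abs(x)) / (b - a), x)
    return 0.0

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def huber_constant(c, s_r=1.0, s=1.0):
    """Integral of the N(0, s^2) density over [-c*s_r, c*s_r]."""
    return 2.0 * normal_cdf(c * s_r / s) - 1.0

def hampel_constant(a, b, s_r=1.0, s=1.0):
    """Central mass minus a/(b-a) times the mass of the two redescending bands."""
    inner = 2.0 * normal_cdf(a * s_r / s) - 1.0
    bands = 2.0 * (normal_cdf(b * s_r / s) - normal_cdf(a * s_r / s))
    return inner - a / (b - a) * bands

c_huber = huber_constant(1.8)
c_hampel = hampel_constant(1.8, 4.0)
```

The Hampel constant is smaller than the Huber one for the same inner cut-off because the redescending bands contribute a negative derivative.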



Below we present a brief simulation study. The time series were generated from the additive effect outliers (AO) model:

\[ Y_t = \theta Y_{t-1} + w_t, \qquad X_t = Y_t + v_t, \]

where the innovations $w_t$ are i.i.d. Gaussian $N(0, 1)$. The variables $v_t$ are also i.i.d. with distribution $(1 - \varepsilon)\delta_0 + \varepsilon N(0, \sigma^2)$, where $\delta_0$ is the distribution that assigns probability 1 to the origin. Therefore, with probability $1 - \varepsilon$ the AR(1) process $Y_t$ is observed, and with probability $\varepsilon$ the observation is the AR(1) process $Y_t$ plus an error with Gaussian distribution $N(0, \sigma^2)$. In this simulation, $\theta = 0.6$, $\varepsilon = 0.05$ and $\sigma^2 = 9$. The figures below show the performances of the estimator $\hat\theta_t$ defined by (4.13), the estimator $\zeta_t$ defined by (4.14) and the least squares estimator (which is equivalent to the recursive procedure defined by (4.10) with $g'(x)/g(x) = -x$). The estimators are computed for the series of length 200, with the additional 30 observations at the beginning on which initial estimates are based; as estimates of the two scale parameters we take the median of the absolute values of the data and of the residuals respectively, divided by 0.6745. The p.d.f. $g(x)$ in (4.13) and (4.14) is replaced by the p.d.f. of $N(0, s^2)$ and the values of the tuning constants are $c = 1.8$, $\alpha = 1.8$ and $\beta = 4$. Figure 1 shows single realizations and the mean squared errors over 300 replications of the estimators $\hat\theta_t^{LS}$, $\hat\theta_t$ and $\zeta_t$ for $t = 5, \ldots, 200$.

Figure 1: Single realizations and the mean squared errors over 300 replications, for t = 5, . . . , 200.

As this brief simulation suggests, both $\hat\theta_t$ and $\zeta_t$ outperform the least squares estimator. A more extensive simulation study is required to assess the performance of these procedures.
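
The data-generating step of the simulation can be sketched as follows. This is an illustration under stated assumptions rather than the code used for Figure 1: the function names `simulate_ao` and `mad_scale`, the zero starting value of the latent AR(1) process and the seeding scheme are hypothetical choices; the parameter defaults are those quoted in the text.

```python
import random
import statistics

def simulate_ao(n, theta=0.6, eps=0.05, sigma2=9.0, seed=0):
    """Generate X_1..X_n from the additive-outlier model X_t = Y_t + v_t."""
    rng = random.Random(seed)
    y, xs = 0.0, []                         # Y_0 = 0 is an assumption of this sketch
    for _ in range(n):
        y = theta * y + rng.gauss(0.0, 1.0)  # latent AR(1) with N(0, 1) innovations
        # v_t is 0 with probability 1 - eps, and N(0, sigma2) with probability eps
        v = rng.gauss(0.0, sigma2 ** 0.5) if rng.random() < eps else 0.0
        xs.append(y + v)
    return xs

def mad_scale(values):
    """Median of absolute values divided by 0.6745 (consistent at the normal)."""
    return statistics.median(abs(v) for v in values) / 0.6745

series = simulate_ao(230, seed=1)   # 30 warm-up observations plus 200
initial_scale = mad_scale(series[:30])
```

The first 30 values would serve to initialise the scale and parameter estimates, after which any of the recursive procedures above can be run over the remaining 200 observations.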

5 Concluding remarks 

This is the final part of a series of three papers (see Sharia (2006a) and Sharia (2006b)). We have introduced estimation procedures (3.1) which are recursive in the sense that each successive estimator is obtained from the previous one by a simple adjustment. To guarantee convergence one has to impose global restrictions on the functions in (3.1) (w.r.t. the parameter $\theta$) such as a monotonicity type assumption and a restriction on the growth at infinity (see Sharia (2006a)). This is the price one has to pay for the nice recursive structure. Once convergence is ensured, the rate of convergence (see Sharia (2006b)) and asymptotic linearity can be deduced from local (in $\theta$) conditions. Also, the results presented give an explicit way of constructing a normalising sequence to ensure local asymptotic linearity. The rest relies on the ergodicity of the model. Asymptotic properties such as asymptotic distribution and efficiency of recursive (as well as non-recursive) estimators depend on the limit theorems available for the model. For example, in the i.i.d. case (see Corollary 4.1), the central limit theorem and the law of large numbers imply that the corresponding recursive procedures are asymptotically normal and, in addition, the likelihood procedure is asymptotically efficient. In general, one can obtain asymptotic distribution and efficiency from asymptotic linearity (Lemma 3.1) and an appropriate central limit theorem.

The model considered in the paper is very general as we do not impose any preliminary restrictions on the probabilistic nature of the observation process and cover a wide class of nonlinear recursive procedures for estimation of a multidimensional parameter. The results are new even for the case of a scalar parameter and provide new insight even for the case of i.i.d. observations.

While the advantage of this approach is its universality, verification of the conditions may be a nontrivial matter in some models. The examples considered give a flavour of what is usually involved in this process and show where our restrictions come from. It is worth mentioning that even in the cases where one has difficulties with verifying our conditions, the results of the paper can be used to determine the form of a recursive procedure (in fact, an algorithm, see Remark 3.2 (iv)-(vi)), which is expected to have the same asymptotic properties as the corresponding non-recursive one defined as a solution of the equation (1.2).

APPENDIX A

Proof of Proposition 3.1 To simplify notation we drop the fixed argument or the index $\theta$ in some of the expressions below.

To prove (a), denote 

X. = ^[Ar,(^)A,_i + A,_i)] 

and 

t t 

s=l s=l 

Applying the formula (summation by parts)

\[ \sum_{s=1}^{t}D_s\,\Delta C_s = D_tC_t - \sum_{s=1}^{t}\Delta D_s\,C_{s-1}, \qquad C_0 = 0 = D_0, \]

with Cs = Ylim=i Xm and Dg = we obtain 

s=l s=l m=l 

Then, $\Delta A_s^{-1} = A_s^{-1} - A_{s-1}^{-1} = -A_s^{-1}(A_s - A_{s-1})A_{s-1}^{-1} = -\Delta A_s\,A_s^{-1}A_{s-1}^{-1}$,
where the last equality follows since $A_s$ is diagonal. Therefore,

St = Af E xs + J2 ^^"'^-1 E 

Finally, since the $A_t$'s are diagonal with non-decreasing elements, applying the Toeplitz Lemma to the components of the right hand side of the latter formula we obtain that $Q_t \to 0$.
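
The summation by parts identity used in part (a) is easy to check numerically. The sketch below evaluates both sides for short integer sequences with $C_0 = 0 = D_0$; the function name and the test sequences are assumptions chosen for the illustration.

```python
def summation_by_parts_sides(D, C):
    """Return both sides of sum_s D_s dC_s = D_t C_t - sum_s dD_s C_{s-1}, s = 1..t.

    D and C are lists indexed 0..t with D[0] = C[0] = 0.
    """
    t = len(C) - 1
    left = sum(D[s] * (C[s] - C[s - 1]) for s in range(1, t + 1))
    right = D[t] * C[t] - sum((D[s] - D[s - 1]) * C[s - 1] for s in range(1, t + 1))
    return left, right

D = [0, 2, 5, 7, 11]
C = [0, 1, -3, 4, 6]
left, right = summation_by_parts_sides(D, C)
```

Both sides equal 53 for these sequences; the identity is simply the telescoping of $\Delta(D_sC_s) = D_s\Delta C_s + \Delta D_s\,C_{s-1}$.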

To prove (b) and (c) denote := J2l=i^s- Since i/j e it follows 

from that is a martingale. Denote by M^-'"' the j-th component of Mf. 
Then the square characteristic {M^^^)^ of the martingale M^^^ is 

s=l 

and, by (LL2), $\sum_{s=1}^{\infty}\Delta\langle M^{(j)}\rangle_s/(A_s^{(jj)})^{2} < \infty$. It therefore follows that $M_t^{(j)}/A_t^{(jj)} \to 0$ $P^{\theta}$-a.s. (see e.g., Shiryayev (1984), Ch.VII, §5, Theorem 4). This proves (c). Now, use of the Lenglart-Rebolledo inequality (see, e.g., Liptser and Shiryayev (1989), Ch.1, §9) yields

\[ P^{\theta}\big\{(M_t^{(j)})^{2} \ge K\,(A_t^{(jj)})^{2}\big\} \le \frac{\varepsilon}{K} + P^{\theta}\big\{\langle M^{(j)}\rangle_t \ge \varepsilon\,(A_t^{(jj)})^{2}\big\} \]

for each $K > 0$ and $\varepsilon > 0$. Then, by (L2), $\langle M^{(j)}\rangle_t/(A_t^{(jj)})^{2} \to 0$ in probability $P^{\theta}$. This implies that $M_t^{(j)}/A_t^{(jj)} \to 0$ in probability $P^{\theta}$ and so, since $A_t$ is diagonal, (2) follows. ◊

Proof of Corollary 4.1 Using Corollary 4.1 in Sharia (2006a) it follows that (I) and (II) imply $(\hat\theta_t - \theta) \to 0$. We have $\Gamma_t(\theta) = t\,\gamma(\theta)$ and $b(\theta,u) = \int\psi(\theta+u,z)\,f(\theta,z)\,\mu(dz)$. It is easy to see that (II) implies (B2) from Corollary 4.1 in Sharia (2006b), and (V) implies that (B1) of the same
Corollary holds with = 1. So, for any < 5 < 1/2, 

(A1) $t^{\delta}(\hat\theta_t - \theta) \to 0.$

Let us check that conditions of Lemma 3.1 are also satisfied with At = 
Condition (EE) trivially holds. According to Proposition 3.1, condition (1) 
follows from (LI). To check (LI), it is sufficient to show that 



1 * 



where 

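\[ R(\theta,u) = R_t(\theta,u) = \gamma(\theta)\,\gamma^{-1}(\theta+u)\int\psi(\theta+u,z)\,f(\theta,z)\,\mu(dz). \]

By (V), $R(\theta,u) = -\gamma(\theta)u + \gamma(\theta)\gamma^{-1}(\theta+u)\,\alpha^{\theta}(u)$ and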

[7(^^)A,.i+i?(^^, A,_i)]^ = ^s^{eYi-\e+/^s-i)a\A,^i) = v/^||A,_if+^5„ 

where, by (III) and (V), 5, = 7(^)7"'(^ + A,_i)a^(A,_i)/|| A,_i||i+^ ^ 0. 
Then, 

I 5s 



Vi||A,_i||^+^5, = J-^ ({s - l)Wi)||A,_i| 



which, by (A1) (since $1/(2(1+\varepsilon)) < 1/2$) converges to zero. Therefore, (A2) is now a consequence of the Toeplitz Lemma.

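For the process $\mathcal{E}_s(\theta)$ from (L2) (since $\|u - v\|^{2} \le 2\|u\|^{2} + 2\|v\|^{2}$), we have

\[
\begin{aligned}
\|\mathcal{E}_s(\theta)\|^{2} &= \big\|\gamma(\theta)\gamma^{-1}(\theta+\Delta_{s-1})\big(\psi(\theta+\Delta_{s-1},X_s) - b(\theta,\Delta_{s-1})\big) - \psi(\theta,X_s)\big\|^{2} \\
&\le 2\big\|\gamma(\theta)\gamma^{-1}(\theta+\Delta_{s-1})\psi(\theta+\Delta_{s-1},X_s) - \psi(\theta,X_s)\big\|^{2} + 2\big\|\gamma(\theta)\gamma^{-1}(\theta+\Delta_{s-1})\,b(\theta,\Delta_{s-1})\big\|^{2}.
\end{aligned}
\]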

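From (III) and (V) we obtain that $(\gamma(\theta)\gamma^{-1}(\theta+\Delta_{s-1}) - 1) \to 0$ and $b(\theta,\Delta_{s-1}) \to 0$ as $s \to \infty$. So, using (IV), it is easy to see that $E_\theta\{(\mathcal{E}_s^{(j)}(\theta))^{2} \mid \mathcal{F}_{s-1}\} \to 0$. Since $(A_t^{(jj)}(\theta))^{2} = t$, (L2) follows from the Toeplitz lemma.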

Therefore, the conditions of Lemma 3.1 hold for $A_t(\theta) = \sqrt{t}$. This implies that $\sqrt{t}\,(\hat\theta_t - \theta_t^{*}) \to 0$ in probability $P^{\theta}$, where

\[ \theta_t^{*} = \theta + \frac{1}{t}\,\gamma^{-1}(\theta)\sum_{s=1}^{t}\psi(\theta,X_s). \]

The asymptotic normality now obviously follows from the central limit theorem for i.i.d. random variables. ◊

Corollary A1 Suppose that $H_t \to \infty$ and $\hat\theta_t$ is derived by (4.7). Denote $\Delta_t = \hat\theta_t - \theta$, $l_t(\theta) = m(X_t,X_{t-1}) - \dot\gamma(\theta)\,h(X_{t-1})$, and suppose also that

(I) 

t 

s=l 

where 

. ... _ 7(g + A.-i)-7W , ... 
^'^'^ - 7(^ + A.._,) 

(II) one of the following two conditions is satisfied:

t 

s=l 

OR 

t 

s=l 

where 

r im 7(9 + A-i) - 1(6 + A.-i) , 

and $\tilde\Delta_t$ is a predictable process with $|\tilde\Delta_t| \le |\Delta_t|$.

Then (4.8) holds, i.e., the estimator $\hat\theta_t$ is asymptotically linear.
Proof. Let us check the conditions of Lemma 3.1 for $\psi_t(\theta) = l_t(\theta)$,

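(A3) $\Gamma_t(\theta) = I_t(\theta) = \ddot\gamma(\theta)\,H_t$

and $A_t(\theta) = H_t^{1/2}$. Since $l_t(\theta)$ is a martingale-difference, we have $E_\theta\{m(X_t,X_{t-1}) \mid \mathcal{F}_{t-1}\} = \dot\gamma(\theta)\,h(X_{t-1})$ and so

(A4) $b_t(\theta,u) = E_\theta\{l_t(\theta+u) \mid \mathcal{F}_{t-1}\} = h(X_{t-1})\big(\dot\gamma(\theta) - \dot\gamma(\theta+u)\big)$

and

\[ R_t(\theta,u) = \frac{\ddot\gamma(\theta)}{\ddot\gamma(\theta+u)}\,h(X_{t-1})\big(\dot\gamma(\theta) - \dot\gamma(\theta+u)\big) = -\frac{\ddot\gamma(\theta)}{\ddot\gamma(\theta+u)}\,h(X_{t-1})\,\ddot\gamma(\theta+\tilde u)\,u \]

where $|\tilde u| \le |u|$. Then, since $\Delta\Gamma_t(\theta) = \Delta I_t(\theta) = h(X_{t-1})\,\ddot\gamma(\theta)$ we have

\[ \Delta\Gamma_t(\theta)\,u + R_t(\theta,u) = h(X_{t-1})\,\ddot\gamma(\theta)\,\frac{\ddot\gamma(\theta+u) - \ddot\gamma(\theta+\tilde u)}{\ddot\gamma(\theta+u)}\,u. \]

Now, since $\Delta H_t = h(X_{t-1})$, it is easy to see that the first condition in (II) implies (1) in Lemma 3.1 and the second condition in (II) implies (L1) in Proposition 3.1. Therefore, (1) holds.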

To verify (2), consider the process $\mathcal{E}_s(\theta)$ defined in (2). Using (A3) and (A4), it is easy to see that

^sie) = (i - ^^f^^^) {m{Xs,Xs-,)-mKXs-,)) 



^^^^ " 7(^ + A._0 

This shows that (I) implies (2). ◊

Proof of Proposition 4.3 Since, by (iii), $\ddot\gamma(\cdot)$ is obviously a continuous function, condition (M2) of Proposition 4.1 in Sharia (2006b) holds. Also, (M1) in the same proposition obviously follows from (i). So, it follows that all the conditions of Proposition 4.1 and Corollary 4.2 in Sharia (2006b) are satisfied, implying that $H_t^{\delta}(\hat\theta_t - \theta) \to 0$ ($P^{\theta}$-a.s.). Also, by (i), $\Delta H_t/H_{t-1} = h(X_{t-1})/H_{t-1} \to 0$, implying that $H_t/H_{t-1} = 1 + \Delta H_t/H_{t-1} \to 1$. So,

(A6) $H_t^{\delta}\Delta_{t-1} = H_t^{\delta}(\hat\theta_{t-1} - \theta) \to 0.$

To establish asymptotic linearity, let us verify that the conditions of Corollary A1 are satisfied. Since $\Delta_{s-1} = \hat\theta_{s-1} - \theta \to 0$ ($P^{\theta}$-a.s.) and $|\tilde\Delta_{s-1}| \le |\Delta_{s-1}|$, by (iii) we obtain that $|\ddot\gamma(\theta+\Delta_{s-1}) - \ddot\gamma(\theta+\tilde\Delta_{s-1})| \le 2K_\theta|\Delta_{s-1}|^{\varepsilon_\theta}$ eventually.
So, 



7(e + A,^i) - 7(9 + A.-i) 

eventually. Now, 

1 

by (A6) 

^^^^^ 2(1+60) ^ I' since 7(-) we obtain that \HiCs{9)\ — > 0. 
Therefore, by the Toeplitz Lemma, the second condition of (II) holds.

Now, since $\mathcal{E}_s(\theta)$ is a martingale-difference, to verify (I), it is sufficient to show that (see e.g., Shiryayev (1984), Ch.VII, §5, Theorem 4)

\[ \sum_{s=1}^{\infty}\frac{E_\theta\{\mathcal{E}_s^{2}(\theta) \mid \mathcal{F}_{s-1}\}}{H_s} < \infty. \]



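Since $E_\theta\{l_s^{2}(\theta) \mid \mathcal{F}_{s-1}\} = \ddot\gamma(\theta)\,h(X_{s-1}) = \ddot\gamma(\theta)\,\Delta H_s$, the above series can be rewritten as

\[ \sum_{s=1}^{\infty}\frac{\Delta H_s}{H_s}\,\ddot\gamma(\theta)\Big(\frac{\ddot\gamma(\theta+\Delta_{s-1}) - \ddot\gamma(\theta)}{\ddot\gamma(\theta+\Delta_{s-1})}\Big)^{2} \]

where, by (iii),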



Now, using (A6) and continuity of 7(-) we deduce that 0. Also, 



(see Sharia (2006b), Appendix A, Proposition A2), implying that the above series converges, which completes the proof. ◊